
Problem

Implement atoi to convert a string to an integer.

Hint: Carefully consider all possible input cases. If you want a challenge, please do not see below and ask yourself what are the possible input cases.

Notes: It is intended for this problem to be specified vaguely (ie, no given input specs). You are responsible to gather all the input requirements up front.

The C convention for atoi()

For special values, this problem follows the convention of C's atoi() function. The convention is as follows:

The function first discards as many whitespace characters as necessary until the first non-whitespace character is found. Then, starting from this character, takes an optional initial plus or minus sign followed by as many numerical digits as possible, and interprets them as a numerical value.

The string can contain additional characters after those that form the integral number, which are ignored and have no effect on the behavior of this function.

If the first sequence of non-whitespace characters in str is not a valid integral number, or if no such sequence exists because either str is empty or it contains only whitespace characters, no conversion is performed.

If no valid conversion could be performed, a zero value is returned. If the correct value is out of the range of representable values, INT_MAX (2147483647) or INT_MIN (-2147483648) is returned.

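This is completely different from the convention of Java's Integer.parseInt(String s). In Java, input that does not match the expected format throws a NumberFormatException, whereas C simply returns 0. When the value falls outside the range INT_MIN to INT_MAX, Java again throws a NumberFormatException, whereas under the convention quoted above atoi() returns 2147483647 or -2147483648.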
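To make the contrast concrete, here is a small sketch of how Integer.parseInt() reacts to inputs that atoi() would accept. The class name ParseIntContrast and the helper describe are made up for this illustration.

```java
public class ParseIntContrast {
    // Hypothetical helper: reports what Integer.parseInt does with the input.
    static String describe(String s) {
        try {
            return "parsed " + Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return "NumberFormatException";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("42"));         // well-formed input parses normally
        System.out.println(describe("  42"));       // leading whitespace is rejected
        System.out.println(describe("42abc"));      // trailing characters are rejected
        System.out.println(describe("2147483648")); // one past Integer.MAX_VALUE is rejected
    }
}
```

Under the atoi() convention the last three inputs would give 42, 42 and 2147483647, so anything built on Integer.parseInt() has to deal with those cases itself.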

Naive approach: match with a regular expression, then parse entirely by hand

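First use a regular expression to pull out the part that represents the number: optional leading whitespace, an optional + or - sign, then one or more digits, with anything after the digits ignored. Then maintain, by hand, a lookup table that maps each digit char to its int value.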
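As a quick look at the extraction step on its own, the sketch below shows what the capture group yields for a few inputs. The class name CaptureDemo, the constant NUMBER and the helper capture are invented for this illustration, and the pattern is only one way to express "whitespace, optional sign, digits".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureDemo {
    // Leading whitespace, an optional sign, then at least one digit.
    private static final Pattern NUMBER = Pattern.compile("^\\s*([+-]?[0-9]+)");

    // Hypothetical helper: returns the captured sign-and-digits text,
    // or null when the input does not start with a number.
    static String capture(String s) {
        Matcher m = NUMBER.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(capture("   -42abc"));  // prints -42
        System.out.println(capture("+7"));         // prints +7
        System.out.println(capture("words 987"));  // prints null
        System.out.println(capture("+-2"));        // prints null
    }
}
```

Whatever comes back non-null is a sign followed by digits only, so the parsing step never has to look at whitespace or stray characters.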

Code

import java.util.HashMap;
import java.util.Map;
import java.util.regex.*;

public class Solution {
    // Leading whitespace, an optional sign, then digits; anything after the digits is left unmatched.
    private static final Pattern P = Pattern.compile("^\\s*([+-]?[0-9]+)");
    private static final Map<Character,Integer> DICTIONARY = new HashMap<>();
    static {
        DICTIONARY.put('0',0);
        DICTIONARY.put('1',1);
        DICTIONARY.put('2',2);
        DICTIONARY.put('3',3);
        DICTIONARY.put('4',4);
        DICTIONARY.put('5',5);
        DICTIONARY.put('6',6);
        DICTIONARY.put('7',7);
        DICTIONARY.put('8',8);
        DICTIONARY.put('9',9);
    }
    public int myAtoi(String str) {
        if (str == null || str.isEmpty()) { return 0; }
        Matcher m = P.matcher(str);
        if (!m.find()) { return 0; }

        char[] chars = m.group(1).toCharArray();
        int head = (chars[0] == '+' || chars[0] == '-')? 1:0;
        int signum = (chars[0] == '-')? -1:1;
        long result = 0l;
        long max = (long)Integer.MAX_VALUE;
        long min = (long)Integer.MIN_VALUE;
        for (int i = head; i < chars.length; i++) {
            result = (result * 10) + (DICTIONARY.get(chars[i]) * signum);
            if (result > 0 && result > max) { return Integer.MAX_VALUE; }
            if (result < 0 && result < min) { return Integer.MIN_VALUE; }
        }
        return (int)result;
    }
}

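Result

Since many conversions that a library could handle are done entirely by hand, and a char-to-int lookup table has to be maintained on top of that, the performance was never going to be great.

[Image: string-to-integer-1]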

Converting directly via ASCII codes

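In ASCII, the character 0 is encoded as 48, and the digits 0 through 9 correspond to the codes 48 through 57. This lets us convert a char straight to an int, with no lookup table to maintain.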
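The arithmetic behind this is just a subtraction. A minimal sketch, where the class DigitValue and the method digitValue are names made up for the example:

```java
public class DigitValue {
    // Hypothetical helper: numeric value of an ASCII digit, or -1 for any other character.
    static int digitValue(char c) {
        return (c >= '0' && c <= '9') ? c - '0' : -1;
    }

    public static void main(String[] args) {
        System.out.println((int) '0');       // prints 48
        System.out.println(digitValue('7')); // prints 7, i.e. 55 - 48
        System.out.println(digitValue('x')); // prints -1
    }
}
```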

Code

import java.util.regex.*;

public class Solution {
    // Leading whitespace, an optional sign, then digits; anything after the digits is left unmatched.
    private static final Pattern P = Pattern.compile("^\\s*([+-]?[0-9]+)");

    public int myAtoi(String str) {
        if (str == null || str.isEmpty()) { return 0; }
        Matcher m = P.matcher(str);
        if (!m.find()) { return 0; }

        char[] chars = m.group(1).toCharArray();
        int head = (chars[0] == '+' || chars[0] == '-')? 1:0;
        int signum = (chars[0] == '-')? -1:1;
        long result = 0l;
        long max = (long)Integer.MAX_VALUE;
        long min = (long)Integer.MIN_VALUE;
        for (int i = head; i < chars.length; i++) {
            result = (result * 10) + (((int)chars[i]-'0') * signum); // in ASCII, '0' = 48
            if (result > 0 && result > max) { return Integer.MAX_VALUE; }
            if (result < 0 && result < min) { return Integer.MIN_VALUE; }
        }
        return (int)result;
    }
}

Result

Unexpectedly, this turned out to be no faster than the first version with the lookup table.

[Image: string-to-integer-2]

Using Integer.parseInt() directly

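Although Java's Integer.parseInt() follows a different convention from atoi(), a regular expression can still filter out badly formatted input for us. The remaining overflow case is handled with a try-catch block.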

Code

import java.util.regex.*;

public class Solution {
    // Leading whitespace, an optional sign, then digits; anything after the digits is left unmatched.
    private static final Pattern P = Pattern.compile("^\\s*([+-]?[0-9]+)");

    public int myAtoi(String str) {
        if (str == null || str.isEmpty()) { return 0; }
        Matcher m = P.matcher(str);
        if (!m.find()) { return 0; }

        String num = m.group(1);
        int signum = (num.charAt(0) == '-')? -1:1;
        try {
            return Integer.parseInt(num); // let Integer.parseInt() do the work for us
        } catch (NumberFormatException e) { // Integer.parseInt() throws on overflow, so handle that case here
            if (signum == 1) {
                return Integer.MAX_VALUE;
            } else {
                return Integer.MIN_VALUE;
            }
        }
    }
}

Result

Still not good. Bizarre. Is the machine just tired?

[Image: string-to-integer-3]

No regular expression: filter directly by the rules

Following the atoi convention: first skip the leading whitespace, then handle the sign, and finally check for overflow.

Code

public class Solution {
    public int myAtoi(String str) {
        if (str == null) { return 0; }
        str = str.trim(); // discards whitespace
        if (str.isEmpty()) { return 0; } // empty or whitespace-only input: no conversion
        char[] chars = str.toCharArray();
        int signum = 1, head = 0;
        if (chars[head] == '+' || chars[head] == '-') { // treat sign
            if (chars[head] == '-') { signum = -1; }
            head++;
        }
        // accumulate
        long result = 0l;
        long max = (long)Integer.MAX_VALUE;
        long min = (long)Integer.MIN_VALUE;
        while (head < chars.length && chars[head] >= '0' && chars[head] <= '9') { // ASCII digits only
            result = (result * 10) + (((int)chars[head++]-'0') * signum); // char '0' = int 48, in ascii
            if (result > 0 && result > max) { result = max; break; }
            if (result < 0 && result < min) { result = min; break; }
        }
        return (int)result;
    }
}

Result

The result is still not good. The regular expression is probably not the main cause of the poor performance.

[Image: string-to-integer-4]
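For checking any of the versions above against the atoi convention, it helps to have a handful of edge cases at hand. The sketch below runs them against one more rule-based variant. This variant is my own assumption and not one of the submissions measured above: it stays within int and tests for overflow before multiplying, instead of accumulating into a long. The class name AtoiSketch and the method atoi are made up for the example, and no claim is made about how it would score.

```java
public class AtoiSketch {
    // Illustrative variant: rule-based parsing with an int-only overflow check.
    static int atoi(String str) {
        if (str == null) { return 0; }
        int i = 0, n = str.length();
        while (i < n && Character.isWhitespace(str.charAt(i))) { i++; } // skip leading whitespace
        if (i == n) { return 0; } // empty or whitespace-only input
        boolean negative = false;
        char c = str.charAt(i);
        if (c == '+' || c == '-') { negative = (c == '-'); i++; }

        final int limit = Integer.MAX_VALUE / 10;     // 214748364
        final int lastDigit = Integer.MAX_VALUE % 10; // 7
        int result = 0; // magnitude accumulated so far, always <= Integer.MAX_VALUE
        while (i < n) {
            int d = str.charAt(i) - '0';
            if (d < 0 || d > 9) { break; } // first non-digit ends the number
            // Would result * 10 + d exceed Integer.MAX_VALUE? Then clamp by sign.
            if (result > limit || (result == limit && d > lastDigit)) {
                return negative ? Integer.MIN_VALUE : Integer.MAX_VALUE;
            }
            result = result * 10 + d;
            i++;
        }
        return negative ? -result : result;
    }

    public static void main(String[] args) {
        System.out.println(atoi("   -42"));          // prints -42
        System.out.println(atoi("4193 with words")); // prints 4193
        System.out.println(atoi("words and 987"));   // prints 0
        System.out.println(atoi("+-2"));             // prints 0
        System.out.println(atoi("2147483648"));      // prints 2147483647
        System.out.println(atoi("-91283472332"));    // prints -2147483648
    }
}
```

The magnitude 2147483648 is clamped even when the sign is negative, which still gives the right answer because the clamped value for a negative number is exactly -2147483648.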